Efficient active learning of sparse halfspaces with arbitrary bounded noise
We study active learning of homogeneous $s$-sparse halfspaces in $\mathbb{R}^d$ under the setting where the unlabeled data distribution is isotropic log-concave and each label is flipped with probability at most $\eta$ for a parameter $\eta \in \big[0, \frac12\big)$, known as the bounded noise model. Even in the presence of mild label noise, i.e. when $\eta$ is a small constant, this is a challenging problem, and only recently have label complexity bounds of the form $\tilde{O}\big(s \cdot \mathrm{polylog}(d, \frac{1}{\epsilon})\big)$ been established in [Zhang 2018] for computationally efficient algorithms. In contrast, under high levels of label noise, the label complexity bounds achieved by computationally efficient algorithms are much worse: the best known result [Awasthi et al. 2016] provides a computationally efficient algorithm with label complexity $\tilde{O}\big((\frac{s \ln d}{\epsilon})^{\mathrm{poly}(1/(1-2\eta))}\big)$, which is label-efficient only when the noise rate $\eta$ is a fixed constant. In this work, we substantially improve on this state of affairs by designing a polynomial-time algorithm for active learning of $s$-sparse halfspaces, with a label complexity of $\tilde{O}\big(\frac{s}{(1-2\eta)^4} \mathrm{polylog}(d, \frac{1}{\epsilon})\big)$. This is the first efficient algorithm with label complexity polynomial in $\frac{1}{1-2\eta}$ in this setting, which is label-efficient even for $\eta$ arbitrarily close to $\frac12$.
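To make the problem setup concrete, the following sketch simulates one simple instance of the bounded (Massart) noise model described in the abstract: points drawn from an isotropic log-concave distribution (here a standard Gaussian), labeled by a hidden $s$-sparse homogeneous halfspace, with each label flipped independently. All function and variable names are illustrative, not from the paper; note that in the general model an adversary may choose a per-point flip probability $\eta(x) \le \eta$, whereas this sketch uses a uniform rate $\eta$.

```python
import numpy as np

def sparse_halfspace_massart(n=10000, d=20, s=3, eta=0.2, seed=0):
    """Draw n points from an isotropic log-concave distribution (a
    standard Gaussian, as a simple instance), label them by a hidden
    s-sparse unit-norm halfspace w_star, and flip each label
    independently with probability eta (uniform-rate special case of
    bounded/Massart noise)."""
    rng = np.random.default_rng(seed)
    # Hidden s-sparse homogeneous halfspace, normalized to unit norm.
    w_star = np.zeros(d)
    support = rng.choice(d, size=s, replace=False)
    w_star[support] = rng.standard_normal(s)
    w_star /= np.linalg.norm(w_star)
    # Isotropic log-concave unlabeled data (standard Gaussian here).
    X = rng.standard_normal((n, d))
    clean = np.sign(X @ w_star)
    # Independent label flips at rate eta < 1/2.
    flips = rng.random(n) < eta
    noisy = np.where(flips, -clean, clean)
    return X, noisy, w_star
```

In an active-learning run, the learner would see `X` but query the noisy labels only for points it selects, aiming to recover `w_star` to error $\epsilon$ with as few queries as possible.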
Review for NeurIPS paper: Efficient active learning of sparse halfspaces with arbitrary bounded noise
Summary and Contributions: This paper studies the problem of learning sparse halfspaces given access to a noisy point-label oracle. In particular, given an underlying true halfspace $h^*$, the goal is to recover an $\epsilon$-accurate sparse representation $h'$ of $h^*$ using a minimal number of noisy-oracle queries. The paper makes the following standard assumptions: (i) the underlying distribution over the points is isotropic log-concave; (ii) the label noise model is Massart noise. Under these assumptions, the paper gives an efficient algorithm which learns halfspaces to accuracy $\epsilon$ using $\tilde{O}\big(\frac{s}{(1-2\eta)^4} \mathrm{polylog}(d, \frac{1}{\epsilon})\big)$ labels, making it the first algorithm with label complexity linear in $s$ in this setting. This is also known to be almost information-theoretically optimal, with the upper and lower bounds differing only by a factor of $O(1/(1-2\eta)^2)$.
Meta-review for NeurIPS paper: Efficient active learning of sparse halfspaces with arbitrary bounded noise
All reviewers agree that this paper makes a solid contribution in the context of active learning of sparse halfspaces in the Massart noise model. The label complexity bound amounts to a major improvement over the best known results for learning halfspaces, which warrants acceptance of the paper. For the camera-ready version, the authors are encouraged to take into account the reviewers' feedback to further improve the discussion of the proposed algorithm (in particular, please address the concern about the lack of intuition for why a mirror-descent approach leads to such large improvements in this setting compared to earlier methods).
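For readers unfamiliar with the mirror-descent component the meta-review refers to, the following is a minimal, hedged sketch of one standard way mirror descent is adapted to sparse targets: replacing the Euclidean mirror map of gradient descent with the $p$-norm map $\Phi(w) = \frac12 \|w\|_p^2$ for $p = 1 + 1/\ln d$ (dual exponent $q = 1 + \ln d$), whose regret/complexity guarantees for $s$-sparse comparators scale with $\mathrm{polylog}(d)$ rather than $\mathrm{poly}(d)$. This is a generic illustration of the technique, not the paper's exact update; the function and parameter names are assumptions.

```python
import numpy as np

def pnorm_md_step(w, grad, lr, d):
    """One mirror-descent step with the p-norm mirror map
    Phi(w) = (1/2) ||w||_p^2, where p = 1 + 1/ln(d).

    link(v, r) computes the gradient of (1/2)||v||_r^2; the map with
    exponent p sends the iterate to the dual space, and the map with
    the conjugate exponent q = p/(p-1) is its exact inverse.
    """
    p = 1.0 + 1.0 / np.log(d)
    q = p / (p - 1.0)  # conjugate exponent, equals 1 + ln(d)

    def link(v, r):
        nv = np.linalg.norm(v, r)
        if nv == 0:
            return np.zeros_like(v)
        return np.sign(v) * np.abs(v) ** (r - 1) * nv ** (2 - r)

    theta = link(w, p) - lr * grad  # gradient step in the dual space
    return link(theta, q)           # map back to the primal space
```

With a zero gradient the step is the identity (the two link maps are mutual inverses), which is a quick sanity check on the implementation; the sparsity benefit comes from the fact that $\Phi$ is strongly convex with respect to a norm close to $\ell_1$, so the method competes with $s$-sparse halfspaces at only a $\mathrm{polylog}(d)$ cost.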